Robustness
Humans show great robustness in language comprehension. They can usually understand a sentence even if it is just a bunch of words without grammar, ignore wrong words, and make sense of words so vague that they are little more than placeholders. So far, NLP systems have not demonstrated such ability.

From Croce et al. (2010) [Croce, D., Giannone, C., Annesi, P., & Basili, R. (2010). Towards Open-Domain Semantic Role Labeling. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, (July), 237–246. Retrieved from http://www.aclweb.org/anthology/P/P10/P10-1025.pdf]: "Most of the employed learning algorithms are based on complex sets of syntagmatic features, as deeply investigated in (Johansson and Nugues, 2008b). The resulting recognition is thus highly dependent on the accuracy of the underlying parser, whereas wrong structures returned by the parser usually imply large misclassification errors."

From Pradhan (2006) [Pradhan, S. S. (2006). Robust Semantic Role Labeling.]: "There is a significant drop in the precision and recall numbers for the AQUAINT test set 60-70% (compared to the precision and recall numbers for the PropBank test set which were 82% and 78% respectively)."

TODO: PropBank --> Brown, etc.

Syntactic parser robustness:
* Hashemi and Hwa (2016): Hashemi, Homa B., and Rebecca Hwa. 2016. "An Evaluation of Parser Robustness for Ungrammatical Sentences." EMNLP 2016.
* Foster (2004): Foster, Jennifer. "Parsing Ungrammatical Input: an Evaluation Procedure." In LREC. 2004.
* Foster (2005): Foster, J., 2005. Good reasons for noting bad grammar: Empirical investigations into the parsing of ungrammatical written English. Trinity College.
* Foster (2007): Foster, Jennifer. "Treebanks gone bad." International Journal on Document Analysis and Recognition 10, no. 3 (2007): 129-145.

POS tagging robustness:
* Gadde et al. (2011): Gadde, Phani, L. V. Subramaniam, and Tanveer A. Faruquie.
"Adapting a WSJ trained part-of-speech tagger to noisy text: preliminary results." In Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data, p. 5. ACM, 2011. TODO: Maity et al. (2016)Maity, S., Chaudhary, A., Kumar, S., Mukherjee, A., Sarda, C., Patil, A. and Mondal, A., 2016, February. WASSUP? LOL: Characterizing Out-of-Vocabulary Words in Twitter. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion (pp. 341-344). ACM. Methodological issues Should all NLP systems be tested for robustness? Gimenez and Marquez (2004)Giménez, J., Màrquez, L., & Marquez, L. (2004). Svmtool: A general pos tagger generator based on support vector machines. Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC’ 04, (December), 43–46. created a SVM-based POS tagger which uses discrete representation and can't handle out-of-vocabulary words. The tagger gets state-of-the-art results on WSJ but should it have been tested on other settings? How representative is WSJ for news text? For human language in general? Analyses Kummerfeld et al. (2012)Kummerfeld, J. K., Hall, D., Curran, J. R., & Klein, D. (2012). Parser Showdown at the Wall Street Corral : An Empirical Investigation of Error Types in Parser Output. In EMNLP 2012 (pp. 1048–1059). perform an interesting analysis of out-of-domain parsing (see Section 5.2). Modalities Internet language and learner language Reviews: Einsenstein (2013)Eisenstein, J., 2013, June. What to do about bad language on the internet. In HLT-NAACL (pp. 359-369)., Plank (2016)Plank, Barbara. "What to do about non-standard (or non-canonical) language in NLP." arXiv preprint arXiv:1608.07836 (2016).. Important: a quantitative analysis: Baldwin et al. (2013)Baldwin, Timothy, Paul Cook, Marco Lui, Andrew MacKinlay, and Li Wang. "How noisy social media text, how diffrnt social media sources?." In IJCNLP, pp. 356-364. 2013. 
POS tagging:
* Owoputi et al. (2013): Owoputi, Olutobi, Brendan O'Connor, Chris Dyer, Kevin Gimpel, Nathan Schneider, and Noah A. Smith. "Improved part-of-speech tagging for online conversational text with word clusters." Association for Computational Linguistics, 2013.
* Ma et al. (2014): Ma, Ji, Yue Zhang, and Jingbo Zhu. "Tagging The Web: Building A Robust Web Tagger with Neural Network." In ACL (1), pp. 144-154. 2014.
* Khan et al. (2013): Khan, Mohammad, Markus Dickinson, and Sandra Kübler. "Towards Domain Adaptation for Parsing Web Data." In RANLP, pp. 357-364. 2013.

Relation extraction:
* Augenstein (2016): Augenstein, I., 2016. Web Relation Extraction with Distant Supervision (Doctoral dissertation, University of Sheffield).

Analysis of learner language (using Czech):
* Rosen (2016): Rosen, Alexandr. "Modeling non-standard language." GramLex 2016 (2016): 120.

Comments and discussions:
* Foster et al. (2011): Foster, J., Wagner, J., Le Roux, J., Nivre, J., Hogan, D., & van Genabith, J. (2011). From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0. In IJCNLP 2011 (pp. 893–901).

Short messages: SMS, micro-blogs and queries

"Workshop on Machine Translation (WMT), recently devoted a shared task to this problem (Callison-Burch et al., 2011) consisting of text messages that were sent during the January 2010 earthquake in Haiti to an emergency response service. Participants were faced with a number of problems ranging from ‘text speak’ to the lack of punctuation (Eidelman et al., 2011)."

"Since most user-generated content documents tend to be rather short, which applies in particular to micro-blogs, it is difficult to interpret them in isolation and it is often beneficial to contextualise them in order to facilitate further analysis. In many cases it is possible to link micro-blog messages to full documents such as news articles (Guo et al., 2013).
Alternatively, one can group or cluster different micro-blog messages together according to hidden properties, for example representing demographic characteristics (Bergsma and Van Durme, 2013)."

NER:
* Derczynski and Bontcheva (2014): Derczynski, Leon, and Kalina Bontcheva. "Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Recognising Person Entities in Tweets." In EACL, pp. 69-73. 2014.
* Derczynski et al. (2015): Derczynski, Leon, Isabelle Augenstein, and Kalina Bontcheva. "USFD: Twitter NER with drift compensation and linked data." arXiv preprint arXiv:1511.03088 (2015).
* Espinosa et al. (2016): Espinosa, Kurt Junshean, Riza Batista-Navarro, and Sophia Ananiadou. "Learning to recognise named entities in tweets by exploiting weakly labelled data." WNUT 2016 (2016): 153.
* Fromreide and Søgaard (2014): Fromreide, Hege, and Anders Søgaard. "NER in Tweets Using Bagging and a Small Crowdsourced Dataset." In International Conference on Natural Language Processing, pp. 45-51. Springer International Publishing, 2014.
* Onal and Karagoz (2015, for Turkish): Onal, Kezban Dilek, and Pinar Karagoz. "Named Entity Recognition from Scratch on Social Media." (2015).
* Schulz (2014, for Dutch): Schulz, Sarah. "Named entity recognition for user-generated content." ESSLLI 2014 Student Session (2014): 207.

Polarity detection:
* Fersini et al. (2016): Fersini, Elisabetta, Enza Messina, and Federico Alberto Pozzi. "Expressive signals in social media languages to improve polarity detection." Information Processing & Management 52, no. 1 (2016): 20-35.

Syntax:
* Pinter et al. (2016): Pinter, Yuval, Roi Reichart, and Idan Szpektor. "Syntactic parsing of web queries with question intent." In Proceedings of NAACL-HLT, pp. 670-680. 2016.

Event:
* Tan (2017): Tan, Luchen. "Tracking Events in Social Media." (2017).

Speech

Automatic speech recognition output and dependency parsing, SRL:
* Favre et al. (2010): Favre, Benoit, Bernd Bohnet, and Dilek Hakkani-Tür.
"Evaluation of semantic role labeling and dependency parsing of automatic speech recognition output." In Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 5342-5345. IEEE, 2010., Shrestha et al. (2015)Shrestha, Niraj, Ivan Vulic, and Marie-Francine Moens. "Semantic role labeling of speech transcripts." In Lecture Notes in Computer Science, vol. 9042, pp. 583-595. Springer, 2015. Entity linking: Benton and Dredze (2015)Benton, Adrian, and Mark Dredze. "Entity Linking for Spoken Language." In HLT-NAACL, pp. 225-230. 2015. Solution Normalization Liu et al. (2012)F. Liu, F. Weng, and X. Jiang. A broad-coverage normalization system for social media language. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, ACL ’12, pages 1035–1044, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics. , Han et al. (2012)B. Han, P. Cook, and T. Baldwin. Automatically constructing a normalisation dictionary for microblogs. In Proceedings of the 2012 Joint Conference on Empiri- cal Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL ’12, pages 421–432, Stroudsburg, PA, USA, 2012. As- sociation for Computational Linguistics. , Yang and Eisenstein (2013)Y. Yang and J. Eisenstein. A log-linear model for unsupervised text normalization. In EMNLP, pages 61–72, 2013. , Li and Liu (2014)C. Li and Y. Liu. Improving text normalization via unsupervised model and dis- criminative reranking. ACL 2014, page 86, 2014. Ruiz et al. (2014, for Spanish)Ruiz, Pablo, Montse Cuadros, and Thierry Etchegoyhen. "Lexical Normalization of Spanish Tweets with Rule-Based Components and Language Models." Procesamiento del Lenguaje Natural (2014): 8. Limsopatham and Collier (2016)Limsopatham, Nut, and Nigel Collier. "Normalising medical concepts in social media texts by learning semantic representation." 
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Vol. 1. 2016.
* Čibej et al. (2016, for Slovene): Čibej, Jaka, Darja Fišer, and Tomaž Erjavec. "Normalisation, tokenisation and sentence segmentation of Slovene tweets." Proceedings of Normalisation and Analysis of Social Media Texts (NormSoMe) (2016): 5-10.

From Baldwin and Li (2015) [Baldwin, Tyler, and Yunyao Li. "An In-depth Analysis of the Effect of Text Normalization in Social Media." In HLT-NAACL, pp. 420-429. 2015.]: "In this work we build a taxonomy of normalization edits and present a study of normalization to examine its effect on three different downstream applications (dependency parsing, named entity recognition, and text-to-speech synthesis). The results suggest that how the normalization task should be viewed is highly dependent on the targeted application. The results also show that normalization must be thought of as more than word replacement in order to produce results comparable to those seen on clean text."

Domain adaptation

...

Datasets

* TED talk treebank, Neubig et al. (2014): Neubig, Graham, Katsuhito Sudoh, Yusuke Oda, Kevin Duh, Hajime Tsukada, and Masaaki Nagata. "The NAIST-NTT TED talk treebank." In International Workshop on Spoken Language Translation. 2014.
* English Web Treebank (LDC)
** see Silveira et al. (2014): Silveira, Natalia, Timothy Dozat, Marie-Catherine De Marneffe, Samuel R. Bowman, Miriam Connor, John Bauer, and Christopher D. Manning. "A Gold Standard Dependency Corpus for English." In LREC, pp. 2897-2904. 2014.
** denoised by Daiber and van der Goot (2016): Daiber, Joachim, and Rob van der Goot. "The denoised web treebank: Evaluating dependency parsing under noisy input conditions." LREC, 2016.
** see also SANCL shared task?
* Treebank of Learner English
* Overview of the 2012 Shared Task on Parsing the Web (Petrov and McDonald, 2012): Petrov, Slav, and Ryan McDonald. "Overview of the 2012 shared task on parsing the web."
In Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL), vol. 59. 2012.
* WikiDisc (French), Ho-Dac and Laippala (2017): Ho-Dac, L. M., and Veronika Laippala. "Le corpus WikiDisc: ressource pour la caractérisation des discussions en ligne." (2017): 107-124.

Normalization

German:
* Bartz et al. (2013): Bartz, T., Beißwenger, M., and Storrer, A. (2013). Optimierung des Stuttgart-Tübingen-Tagset für die linguistische Annotation von Korpora zur internetbasierten Kommunikation: Phänomene, Herausforderungen, Erweiterungsvorschläge. JLCL, 28(1):157–198.
* Sidarenka et al. (2013): Sidarenka, U., Scheffler, T., and Stede, M. (2013). Rule-based normalization of German Twitter messages. In Proceedings of the GSCL Workshop Verarbeitung und Annotation von Sprachdaten aus Genres internetbasierter Kommunikation.
* Laarmann-Quante and Dipper (2016): Laarmann-Quante, Ronja, and Stefanie Dipper. "An Annotation Scheme for the Comparison of Different Genres of Social Media with a Focus on Normalization." In Normalisation and Analysis of Social Media Texts (NormSoMe) Workshop Programme, p. 23. LREC 2016.

Open-source software

* ClearNLP (Choi 2012) [Choi, Jinho D. "Optimization of natural language processing components for robustness and scalability." (2012).] -- POS tagging, syntactic parsing and SRL, tested on OntoNotes
** Now NLP4J, and SRL is gone?

See also

* Non-canonical language: Plank et al. (2015) [Plank, B., Martinez Alonso, H., & Søgaard, A. (2015). Non-canonical language is not harder to annotate than canonical language. Proceedings of the 9th Linguistic Annotation Workshop (LAW IX), 148–151.] argue that it's not harder to annotate
* Book: Farzindar, A., & Inkpen, D. (2015). ''Natural language processing for social media''. Synthesis Lectures on Human Language Technologies (Vol. 8). Morgan & Claypool Publishers.
* Semantic role labeling for spoken language